Abstract. Optimal design of a Phase I cancer trial can be formulated 
as a stochastic optimization problem. By making use of recent advances 
in approximate dynamic programming to tackle the problem, we de- 
velop an approximation of the Bayesian optimal design. The resulting 
design is a convex combination of a "treatment" design, such as Babb 
et al.'s (1998) escalation with overdose control, and a "learning" design, 
such as Haines et al.'s (2003) c-optimal design, thus directly address- 
ing the treatment versus experimentation dilemma inherent in Phase I 
trials and providing a simple and intuitive design for clinical use. Com- 
putational details are given and the proposed design is compared to 
existing designs in a simulation study. The design can also be readily 
modified to include a first stage that cautiously escalates doses similarly 
to traditional nonparametric step-up/down schemes, while validating 
the Bayesian parametric model for the efficient model-based design in 
the second stage. 

Key words and phrases: Dynamic programming, maximum tolerated 
dose, Monte Carlo, rollout, stochastic optimization 



1. INTRODUCTION 

In typical Phase I studies in the development of 
relatively benign drugs, the drug is initiated at low 
doses and subsequently escalated to show safety at 
a level where some positive response occurs, and 
healthy volunteers are used as study subjects. This 
paradigm does not work for diseases like cancer, for 
which a non-negligible probability of severe toxic re- 
action has to be accepted to give the patient some 
chance of a favorable response to the treatment. 
Moreover, in many such situations, the benefits of 
a new therapy may not be known for a long time 
after enrollment, but toxicities manifest themselves 
in a relatively short time period. Therefore, patients 
(rather than healthy volunteers) are used as study 
subjects, and given the hoped-for (rather than ob- 
served) benefit for them, one aims at an acceptable 
level of toxic response in determining the dose. Cur- 
rent designs for Phase I cancer trials, which are se- 
quential in nature, are an ad hoc attempt to rec- 
oncile the objective of finding a maximum tolerated 
dose (MTD) with stringent ethical demands for pro- 
tecting the study subjects from toxicities in excess 
of what they can tolerate. The most common such design treats groups of three 
patients sequentially, starting with the smallest of 
an ordered set of doses. Escalation occurs if no tox- 
icity is observed in all three patients; otherwise an 
additional three patients are treated at the same 
dose level. If only one of the six patients has toxic- 
ity, escalation again continues; otherwise the trial 
stops, with the lower dose declared as MTD. As 
pointed out by Storer (1989), these designs, com- 
monly referred to as 3-plus-3 designs, are difficult 
to analyze, since even a strict quantitative defini- 
tion of MTD is lacking, "although it should be taken 
to mean some percentile of a tolerance distribution 
with respect to some objective definition of clinical 
toxicity," and the "implicitly intended" percentile 
seems to be the 33rd percentile (related to 2/6). 
Storer (1989) also considered three other "up-and- 
down" sequential designs for quantile estimation in 
the bioassay literature and performed simulation 
studies of their performance in estimating the 33rd 
percentile. Subsequent simulation studies by 
O'Quigley et al. (1990) showed the performance of 
these designs to be "dismal," for which they pro- 
vided the following explanation: "Not only do (these 
designs) not make efficient use of accumulated data, 
they make use of no such data at all, beyond say the 
previous three, or sometimes six, responses." They 
proposed an alternative design, called the continual 
reassessment method (CRM), which uses paramet- 
ric modeling of the dose-response relationship and 
a Bayesian approach to estimate the MTD or, more 
generally, the dose level x such that the probability 
$F(x)$ of a toxic event is $p$ (1/3 in the case of MTD). 

Letting $\theta = (\alpha, \beta)'$ and assuming the usual logistic 
model 

$$F_\theta(x) = 1/\{1 + e^{-(\alpha + \beta x)}\} \tag{1}$$

for the probability of a toxic response at dose level x, 
the problem of optimal choice of n dose levels to esti- 
mate the MTD seems to be covered by the theory of 
nonlinear designs. A well-known difficulty in nonlin- 
ear design theory is that the optimal design for pa- 
rameter estimation involves the unknown parameter 
vector. To circumvent the difficulty, it has been pro- 
posed that the design be constructed sequentially, 
using observations made to date to estimate $\theta$ by 
maximum likelihood and choosing the next design 
point by using the MLE to replace the unknown 
parameter value in the optimal design; see Fedorov 
(1972). If $\theta$ is known, then a target probability $p$ 
of response is attained at the level $x_\theta$ that solves 
$F_\theta(x_\theta) = p$, that is, $x_\theta = [\log(p/(1-p)) - \alpha]/\beta$. Wu 
(1985) proposed to use at stage $t + 1$ the certainty 
equivalence (or plug-in) level $x_{\hat\theta_t}$, where $\hat\theta_t$ is the 
MLE of $\theta$ based on $(x_i, y_i)$, $1 \le i \le t$, and $y_i$ is the 
binary response at dose level $x_i$. Using some approximations, 
he also derived a recursive representation 
of $x_{\hat\theta_t}$ and showed that it is asymptotically equivalent 
(as $t \to \infty$) to the adaptive stochastic approximation 
rule of Lai and Robbins (1979). The likelihood 
version of CRM proposed by O'Quigley and 
Shen (1996) in response to the comments of Korn 
et al. (1994) on Bayesian designs is in fact a variant 
of Wu's (1985) design. Babb et al. (1998) pointed 
out that the symmetric nature of squared error loss 
used by CRM may not be appropriate for modeling 
the toxic response to a cancer treatment. They proposed 
the escalation with overdose control (EWOC) 
method, which uses an asymmetric linear loss function 
that penalizes dose level $x = \mathrm{MTD} + \delta$ for $\delta > 0$, 
corresponding to an overdose, more than an underdose 
$x = \mathrm{MTD} - \delta$. Whereas CRM is equivalent to 
estimating the MTD at each stage by the mean of 
the posterior distribution of $x_\theta$, EWOC is equivalent 
to estimating the MTD at each stage by the $\omega$th 
quantile of the posterior distribution of $x_\theta$, where 
$\omega \in (0, 1/2)$ is the so-called feasibility bound, usually 
chosen to be slightly less than $p$. There has also been 
much work on designs intended to give an accurate 
post-experiment estimate of the MTD or other functions 
of the unknown parameter vector $\theta$. For example, 
locally optimal designs such as c- and D-optimal 
designs have been investigated extensively for bi- 
nary responses, and because a nonlinear model's in- 
formation matrix for binary data is a function of 
the unknown parameters, locally optimal designs are 
usually applied using initial estimates, multistage 
methods or Bayesian priors. Haines, Perevozskaya 
and Rosenberger (2003) proposed a two-stage design 
whose first stage is a locally optimal design based 
on a chosen prior, which is then updated sequen- 
tially during its second stage. A comparative study 
of these methods is given in Section 4. 

Since the parameter $\theta$ is unknown, active statistical 
learning involves setting the doses at levels that 
give maximum information about the function of 
the unknown parameters of interest, the MTD, and 
how to do this is a problem in nonlinear experimen- 
tal design theory (Abdelbasit and Plackett, 1983; 
Dette et al., 2004). On the other hand, there is also 
an ethical issue of treating patients in the Phase I 
trial at dose levels below the unknown MTD for 
safety, and hopefully close to the MTD for efficacy. 
This dilemma between treatment of current patients 
and efficient experimentation to gather information 
for future patients was articulated by Lai and Robbins (1979) 
in a simple linear regression model $y_k = \alpha + \beta x_k + \varepsilon_k$, 
where, instead of the MTD, the desired level is $(y^* - \alpha)/\beta$, 
for some given value $y^*$. Whereas an asymptotic theory of how 
this dilemma can be resolved optimally as $n \to \infty$ was developed 
by Lai and Robbins (1979), it was only recently that 
a tractable scheme was developed by Han, Lai and 
Spivakovsky (2006) to compute an approximately 
optimal solution for finite sample size n (number of 
patients enrolled in the trial). 

In Section 2 we introduce a basic stochastic opti- 
mization problem that incorporates the treatment 
versus experimentation dilemma in the design of 
Phase I cancer trials. This problem adopts a Bayesian 
formulation as in CRM and EWOC, for which the 
computation of the posterior distributions of the pa- 
rameters and of the MTD is described in Section 2. 
Because the regression function $F_\theta(x) = E_\theta(y \mid x)$ for 
the binary response $y$ given by (1) is nonlinear in the 
parameters, the stochastic optimization problem is 
considerably more difficult than that for the linear regression 
model $E(y \mid x) = \alpha + \beta x$ considered by Lai and Robbins 
(1979). We review in Section 2 recent advances 
in the field of approximate dynamic programming, 
which we use in Section 3 to develop a new tool for 
tackling the stochastic optimization problem. Using 
this tool, we derive nearly optimal hybrid designs in 
Section 3. These hybrid designs are convex combina- 
tions (and therefore hybrids) of designs that are tar- 
geted toward treating the current patient at the best 
guess of the MTD (e.g., EWOC and CRM) and the 
Haines-Perevozskaya-Rosenberger designs that are 
D- or c-optimal in estimating the model parameters 
for future patients. The weights in these convex com- 
binations are determined by approximate dynamic 
programming and can be conveniently stored to pro- 
vide simple table look-up schemes for the clinical 
user, as noted in Section 5 which gives some conclud- 
ing remarks. Section 4 provides a comparative study 
of the hybrid design and previous designs. It also in- 
troduces a modified hybrid design that incorporates 
the traditional nonparametric step-up/down scheme 
as a cautious first stage, followed by the model-based 
design in the second stage of a Phase I cancer trial. 

2. STOCHASTIC OPTIMIZATION RELATED 
TO THE TREATMENT VERSUS 
EXPERIMENTATION DILEMMA 

To begin with, we specify a prior distribution on 
$\theta$ by following Babb et al. (1998), who first specify a 
range $[x_{\min}, x_{\max}]$ of possible dose values believed to 
contain the MTD, with $x_{\min}$ believed to be a conservative 
starting value. Rather than directly specifying 
the prior distribution $\pi$ for the unknown parameter 
$\theta$ of the working model to be used in the second 
stage, which may be hard for investigators to do in 
practice, an upper bound $q > 0$ on the probability 
$\rho = F_\theta(x_{\min})$ of toxicity at $x_{\min}$ can be elicited from 
investigators; uniform distributions over $[x_{\min}, x_{\max}]$ 
and $[0, q]$ are then taken as the prior distributions for 
the MTD and $F_\theta(x_{\min})$, respectively. Let $\mathcal{F}_k$ denote 
the information set generated by the first $k$ doses 
and responses, that is, by $(x_1, y_1), \ldots, (x_k, y_k)$. Letting 
$\eta$ denote the MTD, it is convenient to transform 
from the unknown parameters $(\alpha, \beta)$ in the 
two-parameter logistic model (1) to $(\rho, \eta)$ via the 
formulas 

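$$\alpha = \frac{x_{\min}\log(1/p - 1) - \eta\log(1/\rho - 1)}{\eta - x_{\min}}, \tag{2}$$

$$\beta = \frac{\log(1/\rho - 1) - \log(1/p - 1)}{\eta - x_{\min}}, \tag{3}$$

giving 

$$\alpha + \beta x = \frac{(x - \eta)\log(1/\rho - 1) - (x - x_{\min})\log(1/p - 1)}{\eta - x_{\min}} =: \psi(x, \rho, \eta). \tag{4}$$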
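The reparametrization from $(\rho, \eta)$ to $(\alpha, \beta)$ can be checked numerically. The following is a minimal sketch, not part of the original design software; the function names `logit_params` and `tox_prob` and the numerical values of $\rho$, $\eta$, $p$ and $x_{\min}$ are illustrative assumptions.

```python
import math


def logit_params(rho, eta, p, x_min):
    """Map (rho, eta) to the logistic parameters (alpha, beta).

    rho is the toxicity probability at x_min, eta is the MTD and
    p is the target toxicity probability, so that F(x_min) = rho
    and F(eta) = p under the logistic model.
    """
    l_rho = math.log(1.0 / rho - 1.0)
    l_p = math.log(1.0 / p - 1.0)
    alpha = (x_min * l_p - eta * l_rho) / (eta - x_min)
    beta = (l_rho - l_p) / (eta - x_min)
    return alpha, beta


def tox_prob(x, alpha, beta):
    """Logistic dose-toxicity curve F(x) = 1 / (1 + exp(-(alpha + beta x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


# Illustrative (assumed) values: doses rescaled so that x_min = 0.
alpha, beta = logit_params(rho=0.1, eta=0.6, p=1 / 3, x_min=0.0)
```

With these inputs the curve passes through $(x_{\min}, \rho)$ and $(\eta, p)$, and the slope $\beta$ is positive because $\rho < p$.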

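Assuming that the joint prior distribution of $(\rho, \eta)$ 
has density $\pi(\rho, \eta)$ with support on $[0, q] \times [x_{\min}, x_{\max}]$, 
the $\mathcal{F}_k$-posterior distribution of $(\rho, \eta)$ has density 

$$f(\rho, \eta \mid \mathcal{F}_k) = C^{-1} \prod_{i=1}^{k} \left[\frac{1}{1 + e^{-\psi(x_i, \rho, \eta)}}\right]^{y_i} \left[\frac{1}{1 + e^{\psi(x_i, \rho, \eta)}}\right]^{1 - y_i} \pi(\rho, \eta), \tag{5}$$

where 

$$C = \int_{x_{\min}}^{x_{\max}} \int_0^q \prod_{i=1}^{k} \left[\frac{1}{1 + e^{-\psi(x_i, \rho, \eta)}}\right]^{y_i} \left[\frac{1}{1 + e^{\psi(x_i, \rho, \eta)}}\right]^{1 - y_i} \pi(\rho, \eta)\, d\rho\, d\eta$$

is the normalizing constant. The marginal $\mathcal{F}_k$-posterior 
distribution of $\eta$ is then 

$$f(\eta \mid \mathcal{F}_k) = \int_0^q f(\rho, \eta \mid \mathcal{F}_k)\, d\rho. \tag{6}$$

The aforementioned CRM and EWOC doses based 
on $\mathcal{F}_k$ are the mean and the $\omega$-quantile of (6). 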
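The posterior computation can be sketched on a grid. This is only a rough illustration under stated assumptions, not the authors' implementation: it uses a midpoint grid in place of proper numerical integration, a uniform prior, and assumed default values ($p = 1/3$, $q = 0.2$, doses rescaled to $[0, 1]$, $\omega = 1/4$). The names `psi`, `eta_posterior`, `crm_dose` and `ewoc_dose` are hypothetical.

```python
import numpy as np


def psi(x, rho, eta, p, x_min):
    """alpha + beta * x written in terms of (rho, eta)."""
    l_rho = np.log(1.0 / rho - 1.0)
    l_p = np.log(1.0 / p - 1.0)
    return ((x - eta) * l_rho - (x - x_min) * l_p) / (eta - x_min)


def eta_posterior(doses, responses, p=1 / 3, q=0.2,
                  x_min=0.0, x_max=1.0, m=200):
    """Grid approximation to the marginal posterior of the MTD eta."""
    # Midpoint grids avoid the singular points rho = 0 and eta = x_min.
    rho = (np.arange(m) + 0.5) * q / m
    eta = x_min + (np.arange(m) + 0.5) * (x_max - x_min) / m
    R, H = np.meshgrid(rho, eta, indexing="ij")
    loglik = np.zeros_like(R)
    for x, y in zip(doses, responses):
        z = psi(x, R, H, p, x_min)
        # log F(x) = -log(1 + e^{-z}),  log(1 - F(x)) = -log(1 + e^{z})
        loglik += -np.logaddexp(0.0, -z) if y == 1 else -np.logaddexp(0.0, z)
    weights = np.exp(loglik - loglik.max())  # uniform prior: posterior ∝ likelihood
    marginal = weights.sum(axis=0)           # sum over the rho grid
    return eta, marginal / marginal.sum()


def crm_dose(eta, prob):
    """Posterior mean of eta (the CRM dose)."""
    return float(np.sum(eta * prob))


def ewoc_dose(eta, prob, omega=0.25):
    """omega-quantile of the posterior of eta (the EWOC dose)."""
    cdf = np.cumsum(prob)
    return float(eta[np.searchsorted(cdf, omega)])
```

For example, `eta_posterior([0.5], [1])` returns the grid and posterior weights after one toxicity at dose 0.5; since a toxicity is more likely when the MTD is low, the posterior mean falls below the prior mean of 0.5.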

2.1 A Global Risk Function and Its Minimization 

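Note that using EWOC or CRM amounts to the 
"myopic" policy of dosing the $(k+1)$th patient at 
the dose $x_{k+1} = x$ that minimizes $E[h(x, \eta) \mid \mathcal{F}_k]$, in 
which 

$$h(x, \eta) = \begin{cases} (x - \eta)^2 & \text{for CRM}, \\ \omega(\eta - x)^+ + (1 - \omega)(x - \eta)^+ & \text{for EWOC}, \end{cases} \tag{7}$$

where $x^+ = \max(x, 0)$ and 

$$E[h(x, \eta) \mid \mathcal{F}_k] = \int_{x_{\min}}^{x_{\max}} h(x, \eta) f(\eta \mid \mathcal{F}_k)\, d\eta.$$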

Since the information about the dose-toxicity relationship 
gained from $x_{k+1}$ and the response $y_{k+1}$ 
affects the ability to safely and effectively dose the 
other patients $k+2, k+3, \ldots, n$, one potential weakness 
of these myopic policies is that they may be 
inadequate in generating information on $\theta$ for treating 
the rest of the patients, as well as for the post-experimental 
estimate of the MTD for subsequent 
phases. To incorporate these considerations in a 
Phase I trial, $x_1, x_2, \ldots, x_n$ should be chosen sequentially 
in such a way as to minimize the global risk 

$$E\left[\sum_{i=1}^{n} h(x_i, \eta) + g(\hat\eta, \eta)\right], \tag{8}$$

in which the expectation is taken over the joint distribution 
of $(\rho, \eta; x_1, y_1, \ldots, x_n, y_n)$. Note that (8) 
measures the effect of the dose $x_k$ on the $k$th patient 
through $h(x_k, \eta)$, its effect on future patients in the 
trial through $\sum_{i=k+1}^{n} h(x_i, \eta)$, and its effect on the 
post-trial estimate $\hat\eta$ through $g(\hat\eta, \eta)$. It can therefore 
be used to address the dilemma between safe 
treatment of current patients in the study and efficient 
experimentation to gather information about 
$\eta$ for future patients. As noted in Section 1, Lai and 
Robbins (1979) have introduced a similar global risk 
function to address the dilemma between information 
and control in the choice of $x_k$ in the linear 
regression model $y_k = \alpha + \beta x_k + \varepsilon_k$ so that the outputs 
$y_k$, $1 \le k \le n$, are as close as possible to some 
target value $y^*$. Specifically, they consider (8) with 
$g = 0$ and $h(x; \alpha, \beta) = (\alpha + \beta x - y^*)^2$. 

Dynamic programming is a standard approach to 
a stochastic optimization problem of the form (8). 



Define 

$$h_k(x) = \begin{cases} E[h(x, \eta) \mid \mathcal{F}_k], & 0 \le k < n - 1, \\ E[h(x, \eta) + g(\hat\eta(x_1, \ldots, x_{n-1}, x), \eta) \mid \mathcal{F}_{n-1}], & k = n - 1. \end{cases} \tag{9}$$

To minimize (8), dynamic programming solves for 
the optimal design $x_1^*, \ldots, x_n^*$ by backward induction 
that determines $x_k^*$ by minimizing 

$$h_{k-1}(x) + E\left[\sum_{i=k+1}^{n} h_{i-1}(x_i^*) \,\Big|\, \mathcal{F}_{k-1}, x_k = x\right] \tag{10}$$

after determining the future dose levels $x_{k+1}^*, \ldots, x_n^*$. 
Note that (10) involves computing the conditional 
expectation of $\sum_{i=k+1}^{n} h(x_i^*, \eta)$ given the dose $x$ 
at stage $k$ and the information set $\mathcal{F}_{k-1}$, and that 
$x_k^*$ is determined by minimizing such conditional 
expectation over all $x$. For $i \ge k + 1$, since $x_i^*$ is a 
complicated nonlinear function of the past observations 
and of $y_k, x_{k+1}^*, y_{k+1}, \ldots, x_{i-1}^*, y_{i-1}$ that are not yet 
observed, evaluation of the aforementioned conditional 
expectation is a formidable task. To overcome 
this difficulty, we use recent advances in approxi- 
mate dynamic programming, which we first review 
and then extend and modify for the problem of min- 
imizing the global risk (8). 

2.2 Rollout Algorithms 

To begin with, consider the problem of minimizing (8) 
with $g = 0$ and $h(x; \alpha, \beta) = (\alpha + \beta x - y^*)^2$ 
in the linear regression model $y_k = \alpha + \beta x_k + \varepsilon_k$ 
with i.i.d. normal errors $\varepsilon_k$ having mean 0. Assuming 
a normal prior distribution of $(\alpha, \beta)$, the posterior 
distribution of $(\alpha, \beta)$ given $\mathcal{F}_{i-1}$ is also bivariate 
normal with parameters $E_{i-1}(\alpha)$, $E_{i-1}(\beta)$, 
$E_{i-1}(\alpha^2)$, $E_{i-1}(\beta^2)$, $E_{i-1}(\alpha\beta)$, in which $E_{i-1}$ denotes 
conditional expectation given $\mathcal{F}_{i-1}$. These conditional 
moments have explicit recursive formulas; 
see Section 4 of Han, Lai and Spivakovsky (2006). 
The myopic policy that chooses $x$ at stage $i$ to minimize 
$E[(\alpha + \beta x - y^*)^2 \mid \mathcal{F}_{i-1}]$ is given explicitly by 

$$x_i = E_{i-1}\{(y^* - \alpha)\beta\}/E_{i-1}(\beta^2) = \{y^* E_{i-1}(\beta) - E_{i-1}(\alpha\beta)\}/E_{i-1}(\beta^2). \tag{11}$$

Although the myopic policy is suboptimal for the 
global risk function (8), Han, Lai and Spivakovsky 
(2006) use it as a substitute for the intractable $x_i^*$ 
for $k + 1 \le i \le n$ in (10), in which the conditional 
expectation can then be evaluated by Monte Carlo 
simulation. This method is called rollout in approximate 
dynamic programming. The idea is to approximate 
the optimal policy by minimizing (10) 
with $x_{k+1}^*, \ldots, x_n^*$ replaced by some known base policy 
$x_{k+1}, \ldots, x_n$, which ideally is some easily computed 
policy that is not far from the optimum. Specifically, 
given a base policy $\mathbf{x} = (x_1, \ldots, x_n)$, let $x_k^{(1)}$ 
be the $x$ that minimizes 

$$h_{k-1}(x) + E\left[\sum_{i=k+1}^{n} h_{i-1}(x_i) \,\Big|\, \mathcal{F}_{k-1}, x_k = x\right], \tag{12}$$

and the expectation in the second term in (12) is 
typically evaluated by Monte Carlo simulation. The 
policy $\mathbf{x}^{(1)} = (x_1^{(1)}, \ldots, x_n^{(1)})$ is called the rollout of 
$\mathbf{x}$ and has been used for stochastic control problems 
arising in a variety of applications; see Section 2.1 of 
Han, Lai and Spivakovsky (2006). The rollout $\mathbf{x}^{(1)}$ 
may itself be used as a base policy, yielding $\mathbf{x}^{(2)}$, and, 
in theory, this process may be repeated an arbitrary 
number of times, yielding $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \mathbf{x}^{(3)}, \ldots$. Letting 
$R(\mathbf{x}) = E[\sum_{i=1}^{n} h_{i-1}(x_i)]$, Bayard (1991) showed that, 
regardless of the base policy, rolling out $n$ times 
yields the optimal design and that rolling out always 
improves the base design, that is, that 

$$R(\mathbf{x}) \ge R(\mathbf{x}^{(1)}) \ge R(\mathbf{x}^{(2)}) \ge \cdots \ge R(\mathbf{x}^{(n)}) = R(\mathbf{x}^*) \tag{13}$$

for any policy $\mathbf{x}$, where $\mathbf{x}^*$ denotes the optimal policy. 
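The rollout step (12) can be illustrated on a toy control problem. The sketch below is a generic illustration and not the Phase I design itself: the state, the dynamics `toy_step`, the stage cost `toy_cost` and the base policy `toy_base` are all hypothetical stand-ins, with the stage cost playing the role of $h_{i-1}$ and the simulated transitions playing the role of the posterior updates.

```python
import random


def rollout_action(state, k, n, actions, step, cost, base_policy,
                   n_sims=200, seed=0):
    """Choose the stage-k action by one-step lookahead plus base policy.

    For each candidate action, the cost of taking it at stage k and then
    following base_policy for stages k+1, ..., n is estimated by
    averaging over n_sims simulated trajectories.
    """
    rng = random.Random(seed)
    best_action, best_value = None, float("inf")
    for a in actions:
        total = 0.0
        for _ in range(n_sims):
            c = cost(state, a)              # immediate cost of candidate a
            s = step(state, a, rng)
            for i in range(k + 1, n + 1):   # remaining stages use the base policy
                b = base_policy(s, i)
                c += cost(s, b)
                s = step(s, b, rng)
            total += c
        value = total / n_sims
        if value < best_value:
            best_action, best_value = a, value
    return best_action, best_value


# Toy problem (assumed for illustration): move a position toward 0.
def toy_step(s, a, rng):
    return s + a                 # deterministic here; rng is available for noise


def toy_cost(s, a):
    return (s + a) ** 2          # squared distance from 0 after acting


def toy_base(s, i):
    return 0                     # base policy: never move
```

From state 3 with five stages, `rollout_action(3, 1, 5, [-1, 0, 1], toy_step, toy_cost, toy_base)` selects the action $-1$, which improves on the base policy's choice of 0, in the spirit of the inequality (13).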

For the global risk function (8) associated with 
Phase I designs, with h given by (7), one can use the 
myopic design EWOC or CRM as the base design 
in the rollout procedure. In contrast with the explicit 
formula (11) for the case of a linear regression 
model with normal errors $\varepsilon_i$, the posterior distribution 
with density function (5) does not have finite-dimensional 
sufficient statistics and the myopic design 
involves (a) bivariate numerical integration to 
evaluate 

$$E[h_i(x_{i+1}) \mid \mathcal{F}_{k-1}, x_k = x]$$

for $i \ge k$, and (b) minimization of the conditional 
expectation over $x$. The simulation studies in Section 4, in 
which the rollout is implemented with EWOC as the 
base design, show substantial improvements of the 
rollout over EWOC and CRM. Although (13) says 
that rolling out a base design can improve it and 
rolling out n times yields the dynamic programming 
solution, in practice, it is difficult to use a rollout 



(which is defined by a backward induction algorithm 
that involves Monte Carlo simulations followed by 
numerical optimization at every stage) as the base 
policy for another rollout. To overcome this diffi- 
culty, we need a tractable representation of succes- 
sive rollouts, which we develop by using other ideas 
from approximate dynamic programming (ADP). 

2.3 Combining Least Squares with Monte Carlo 
in ADP 

The conditional expectation in (10), as a function 
of $x$, is called the cost-to-go function in dynamic 
programming. An ADP method, which grew 
out of the machine learning (or, more specifically, 
reinforcement learning) literature, is based on two 
statistical concepts concerning the conditional expectation. 
First, for given $x$ and the past information 
$\mathcal{F}_{k-1}$, the conditional expectation is an expectation 
and therefore can be evaluated by Monte Carlo simulations, 
if one knows how $h_k(x_{k+1}^*), \ldots, h_{n-1}(x_n^*)$ 
are generated. The second concept is that, by (9), 
$h_i(x_{i+1}^*)$ is a conditional expectation given $\mathcal{F}_i$, which 
is a regression function (or minimum-variance prediction) 
of $h_i(x_{i+1}^*)$, with regressors (or predictors) 
generated from $\mathcal{F}_i$. Based on a large sample (generated 
by Monte Carlo), the regression function can 
be estimated by least squares using basis function 
approximations, as is typically done in nonparametric 
regression. Combining least squares (LS) regression 
with Monte Carlo (MC) simulations yields the 
following LS-MC method for Markov decision problems 
in reinforcement learning. Let $\{s_t, t \ge 0\}$ be 
a Markov chain whose transition probabilities from 
state $s_t$ to $s_{t+1}$ depend on the action $x_t$ at time $t$, 
and let $f_t(s, x)$ denote the cost function at time $t$, 
incurred when the state is $s$ and the action $x$ is taken. 
Consider the statistical decision problem of choosing 
$x$ at each stage $k$ to minimize the cost-to-go function 

$$Q_k(s, x) = E\left\{f_k(s, x) + \sum_{t=k+1}^{n} f_t(s_t, x_t) \,\Big|\, s_k = s, x_k = x\right\}, \tag{14}$$

assuming that $x_{k+1}, \ldots, x_n$ have been determined. 
Let 

$$V_k(s) = \min_x Q_k(s, x), \qquad x_k^* = \arg\min_x Q_k(s, x). \tag{15}$$

These functions can be evaluated by the backward 
induction algorithm of dynamic programming: 
$V_n(s) = \min_x f_n(s, x)$, and for $n > k \ge 1$, 

$$V_k(s) = \min_x \{f_k(s, x) + E[V_{k+1}(s_{k+1}) \mid s_k = s, x_k = x]\}, \tag{16}$$

in which the minimizer yields $x_k^*$. The LS-MC method 
uses basis functions $\phi_j$, $1 \le j \le J$, to approximate 
$V_{k+1}$ by $\hat V_{k+1} = \sum_{j=1}^{J} a_{k+1,j}\phi_j$, and uses this approximation 
together with $B$ Monte Carlo simulations 
to approximate 

$$E[\hat V_{k+1}(s_{k+1}) \mid s_k = s, x_k = x]$$

for every $x$ in a grid of representative values. This 
yields an approximation $\tilde V_k$ to $V_k$ and also $\hat x_k$ to $x_k^*$. 
Moreover, using the sample 

$$\{(s_{k,b}, \tilde V_k(s_{k,b})): 1 \le b \le B\} \tag{17}$$

generated by the control action $\hat x_k$, we can perform 
least squares regression of $\tilde V_k(s_{k,b})$ on $(\phi_1(s_{k,b}), \ldots, 
\phi_J(s_{k,b}))$ to approximate $\tilde V_k$ by $\hat V_k = \sum_{j=1}^{J} a_{k,j}\phi_j$. 
Further details of this approach can be found in 
Chapter 6 of Bertsekas (2007). 

Although the problem (10) can be viewed as a 
Markov decision problem with the $\mathcal{F}_t$-posterior 
distribution being the state $s_t$, the state space of 
the Markov chain at hand is infinite-dimensional, 
consisting of all bivariate posterior distributions of 
the unknown parameter vector $(\alpha, \beta)$. If the state 
space were finite-dimensional, for example, $\mathbb{R}^m$, then 
one could approximate the value functions (15) by 
commonly used basis functions in nonparametric re- 
gression, such as regression splines and their ten- 
sor products; see Hastie, Tibshirani and Friedman 
(2001). However, in the infinite-dimensional case, 
there is no such simple choice of basis functions 
of posterior distributions, which are the states. As 
pointed out in Section 6.7 of Bertsekas (2007), an al- 
ternative to approximating the value functions Vk, 
called approximation in value space, is to approxi- 
mate the optimal policy by a parametric family of 
policies so that the total cost can be optimized over 
the parameter vector. This approach is called ap- 
proximation in policy space and most of its literature 
has focused on finite-state Markov decision prob- 
lems and gradient-type optimization methods that 
approximate the derivatives of the costs, as func- 
tions of the parameter vector, by simulation. We 
now describe a new method for approximation in 



policy space, which uses iterated rollouts to opti- 
mize the parameters in a suitably chosen parametric 
family of policies. 

The choice of the family of policies should involve 
domain knowledge and reflect the kind of policies 
that one would like to use for the actual application. 
One would therefore start with a set of real-valued 
basis functions of the state $s_t$ of the Markov 
chain with general, possibly infinite-dimensional, 
state space, on which the family of chosen policies 
will be based. The control policies in this family can 
be represented by $\pi_t(\phi_1(s_t), \ldots, \phi_m(s_t); \beta)$, which is 
the action taken at time $t$ [after $s_t$ has been observed 
and the basis functions $\phi_1(s_t), \ldots, \phi_m(s_t)$ have been 
evaluated] and in which $\beta$ is a parameter to be chosen 
iteratively by using successive rollouts, with 

$$\{\pi_t(\phi_1(s_t), \ldots, \phi_m(s_t); \beta^{(j)}),\ 1 \le t \le n\}$$

being the base policy for the rollout $\mathbf{x}^{(j+1)}$. Using 
the simulated sample 

$$\{(s_{k,b}, x_{k,b}^{(j+1)}): 1 \le b \le B\},$$

in which $s_{k,b}$ denotes the $b$th simulated replicate of 
$s_k$, least squares regression of $x_{k,b}^{(j+1)}$ on 
$\pi_k(\phi_1(s_{k,b}), \ldots, \phi_m(s_{k,b}); \beta)$ is performed to estimate $\beta$ by $\beta^{(j+1)}$; 
nonlinear least squares is used if $\pi_k$ is nonlinear in $\beta$. 
In view of (13), each iteration is expected to provide 
improvements over the preceding one. A concrete 
example of this method in a prototypical Phase I 
setting is given in the next section, where linear re- 
gression splines are used in iterated rollouts. In this 
setting the state variable $s_t$ represents the complete 
treatment history up to time $t$ in the trial (all prior 
distributions, doses and responses up to that time), 
and the cost function $f_t(s_t, x)$ will be replaced by 
$h_t(x)$ given by (9). 

3. HYBRID DESIGNS AS BASE POLICIES 
FOR ITERATED ROLLOUTS 

In their use of rollouts to approximate the opti- 
mum for (8) for the normal model, Han, Lai and 
Spivakovsky (2006), Section 3, used the structure of 
their problem to come up with an ingenious "per- 
turbation of the myopic rule" as a base policy to 
improve the performance of the rollout, without per- 
forming second- or higher-order rollouts. In this sec- 
tion we explore this technique in the context of Phase I 
designs, using such perturbations — called here hy- 
brid designs — both as base policies and as a way to 
represent highly complicated but efficient policies in 
a simple, clinically useful way. As pointed out in 
Section 2.1, the objective function of the dynamic 
programming problem (8) involves both experimentation 
(for estimating the MTD) and treatment (for 
the patients in the study). Consider the $k$th patient 
in a trial of length $n$ ($\ge k$). If the $k$th patient were 
the last patient to be treated in the trial ($n = k$), 
the best dose to give him/her would be the myopic 
dose $m_k$ that minimizes $h_{k-1}(x_k)$, given by (9). On 
the other hand, early on in the trial, especially if 
$n - k$ is relatively large, one expects the optimal 
dose to be perturbed from $m_k$ in the direction of 
a dose that provides more information about the 
dose-response model, for the relatively large number 
of doses that will have to be set for the future 
patients. Since the optimal design theory for learning 
the MTD under overdose constraints, developed 
by Haines, Perevozskaya and Rosenberger (2003), 
yields a c- or D-optimal design $\xi_k$, we propose to 
use the following hybrid design representation of the 
optimal dose sequence: 

$$x_k^* = (1 - \epsilon_k) m_k + \epsilon_k \xi_k, \tag{18}$$

where $\xi_k$ is the chosen "learning design." Of course, 
any dosing policy admits the representation (18) 
with 

$$\epsilon_k = \frac{x_k^* - m_k}{\xi_k - m_k}.$$

However, we will show that it is possible to use rollouts 
to choose $\epsilon_k$ of a simple form, not depending 
on $x_k^*$, such that the resulting hybrid design given 
by the right-hand side of (18) is highly efficient. 
Similar ideas have been used in "$\epsilon$-greedy policies" 
in reinforcement learning (Sutton and Barto, 1998, 
page 122). 

From our simulation studies that include the example 
in Section 3.2, we have found that the sequential 
c-optimal design (Haines et al., 2003, Section 5) 
with $c$ being the vector $(0, 1)'$ works well for the learning 
design $\xi_k$ in (18), which we now briefly explain. In 
general, optimal designs such as c- and D-optimal 
can be characterized as optimizing some convex loss 
function $\Psi$ of the information matrix $I(\theta, \xi)$ associated 
with the parameter value $\theta$ and a measure $\xi$ on 
the space of design points (see Fedorov, 1972). Here 
$(nI(\theta, \xi))^{-1}$ is interpreted as the asymptotic variance 
of the MLE $\hat\theta_n$ of $\theta$. The optimization problem 
can be generalized to the sequential Bayes setting, 
with prior distribution $\pi$ on $\theta$, by finding the $\xi$ that 
minimizes 

$$\int \Psi[I(\theta, \xi_{k-1}) + I(\theta, \xi)]\, \pi(\theta \mid \mathcal{F}_{k-1})\, d\theta \tag{19}$$

at the $k$th stage, where $\xi_{k-1}$ is the empirical measure 
of the previous design points. In the case $k = 1$, (19) 
is replaced by $\int \Psi[I(\theta, \xi)]\, \pi(\theta)\, d\theta$. For a given vector 
$c$, the c-optimal design measure minimizes the 
asymptotic variance of the linear estimator $c'\hat\theta_n$ of 
$c'\theta$ or, equivalently, $\Psi[I(\theta, \xi)] = c'(I(\theta, \xi))^{-1}c$. Taking 
the Bayesian c-optimal design with $c = (0, 1)'$ as 
the learning design in (18) gives $c'\theta = c'(\alpha, \beta)' = \beta$; 
hence, this design is optimal, in some sense, for 
learning about $\beta$ or, equivalently, about the slope 

$$\frac{d}{dx} E(y \mid x)\Big|_{x=\eta} = \frac{d}{dx}\left[\frac{1}{1 + e^{-(\alpha + \beta x)}}\right]_{x=\eta} = \beta p(1 - p)$$

of the dose response curve (1) at the MTD, for which 
$p$ is 1/3 or some other prespecified value. This has 
the following connections to the stochastic optimization 
problem of Lai and Robbins (1979) discussed in 
Section 1 and to the rollout procedure of Han, Lai 
and Spivakovsky (2006). For the normal model discussed 
in Section 1 and as an asymptotic limiting 
case of other models, Sacks (1958) showed that the 
optimal value of the step size (a user-supplied parameter 
in the Lai-Robbins procedure affecting its 
convergence rate) is proportional to $(d/dx)E(y \mid x)$. 
Moreover, Han, Lai and Spivakovsky (2006), Section 3, 
found that in the normal model, perturbations 
of the myopic policy in the direction of this 
c-optimal design provide a base design for a rollout 
that has comparable performance to that of an "oracle 
policy." 

3.1 Relating $\epsilon_k$ to the Uncertainty in the Bayes 
Estimate $E(\eta \mid \mathcal{F}_{k-1})$ 

Since the treatment versus experimentation 
dilemma discussed in Section 2 stems from the uncertainty 
in the current estimate of the MTD $\eta$, it is 
natural to expect that the amount of perturbation 
from the myopic dose $m_k$ depends on the degree 
of such uncertainty, using little perturbation when 
the posterior distribution of $\eta$ is peaked, and much 
more perturbation when it is spread out. This suggests 
choosing $\epsilon_k$ as a function of the posterior variance 
$\nu_{k-1} = \mathrm{Var}(\eta \mid \mathcal{F}_{k-1})$, whose reciprocal is called 
the "precision" of $E(\eta \mid \mathcal{F}_{k-1})$ in Bayesian parlance. 
Following the approach described in Section 2.3, we 
use functions of $s_k = \nu_{k-1}/\nu_0$ as basic features of 
the posterior distribution of $\eta$ to approximate the 
$\epsilon_k$ in (18). 
To begin, Monte Carlo simulations are performed 
to obtain the rollout $\mathbf{x}^{(1)}$ of EWOC, yielding a simulated 
sample $\{(\epsilon_{k,b}, s_{k,b}): 1 \le b \le B\}$, where $\epsilon_{k,b}$ is 
the $b$th simulated replicate of 

$$\epsilon_k = \frac{x_k^{(1)} - m_k}{\xi_k - m_k}, \tag{20}$$

which is essentially the same as (18) with $x_k^*$ replaced 
by $x_k^{(1)}$. The basic idea in Section 2.3 can 
be implemented via nonparametric regression of $\epsilon_{k,b}$ 
on $s_{k,b}$, yielding the estimated regression function 
$\hat g_k$. Letting $\hat\epsilon_k = \hat g_k(s_k)$, the hybrid design $\hat x_k = (1 - 
\hat\epsilon_k) m_k + \hat\epsilon_k \xi_k$ can then be used as the base policy to 
form the rollout $\mathbf{x}^{(2)}$, and this procedure can be repeated 
to obtain the iterated rollouts $\mathbf{x}^{(3)}, \mathbf{x}^{(4)}, \ldots$. 

Linear regression splines, and their tensor products 
for multivariate regressors, provide a convenient 
choice of basis functions; see Section 9.4 of Hastie, 
Tibshirani and Friedman (2001). For the present 
problem, it suffices to use a truncated linear function 

$$\hat g_k(s) = \min\{1, (\beta_k^{(1)} + \beta_k^{(2)} s)^+\} \quad \text{for } s_* \le s \le s^*, \tag{21}$$

where $s_*$ and $s^*$ are the minimum and maximum 
of the sample values $s_{k,b}$, $1 \le b \le B$, and to extend 
beyond the range $[s_*, s^*]$ by 

$$\hat g_k(s) = \begin{cases} s\,\hat g_k(s_*)/s_*, & 0 \le s < s_*, \\ \hat g_k(s^*), & s > s^*, \end{cases} \tag{22}$$

which agrees with the constraint $\hat g_k(0) = 0$ and ensures 
that the weight assigned to experimentation 
does not exceed $\hat g_k(s^*)$. A further simplification is 
to group the data into $K$ blocks so that $\hat\epsilon_k = \hat g_k(s)$ 
does not vary with $k$ within each block, since it is 
expected that the amount of experimentation for 
the initial stages depends mostly on the uncertainty 
about $\eta$, while for the final stages experimentation 
would only benefit the post-trial estimate of $\eta$. 
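The truncated linear weight (21), its extension (22) and the hybrid dose (18) amount to a few lines of arithmetic. The following is a minimal sketch under stated assumptions: the function names `weight` and `hybrid_dose` and the argument names are hypothetical, and the coefficients would in practice come from the least squares fit to the rollout sample rather than being supplied by hand.

```python
def weight(s, b1, b2, s_lo=None, s_hi=None):
    """Experimentation weight as a truncated linear function of s.

    s is the ratio of the current posterior variance of the MTD to the
    prior variance; b1 and b2 are the fitted intercept and slope.
    If the sample range [s_lo, s_hi] is given (with s_lo > 0), the
    weight is extended linearly to 0 below s_lo and held constant
    above s_hi.
    """
    def g(u):
        return min(1.0, max(0.0, b1 + b2 * u))

    if s_lo is not None and s < s_lo:
        return s * g(s_lo) / s_lo
    if s_hi is not None and s > s_hi:
        return g(s_hi)
    return g(s)


def hybrid_dose(m_k, xi_k, eps):
    """Convex combination of the myopic dose m_k and the learning dose xi_k."""
    return (1.0 - eps) * m_k + eps * xi_k
```

For instance, `hybrid_dose(0.4, 0.8, 0.25)` moves a quarter of the way from a myopic dose of 0.4 toward a learning dose of 0.8, giving 0.5. Storing only the fitted intercept and slope is what makes a table look-up implementation possible for the clinical user.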

3.2 Example and Simulation Study 

We illustrate the method in Section 3.1 by applying it to the following example, in which $n = 10$ and $[x_{\min}, x_{\max}]$ is transformed to $[0,1]$ by location and scale changes. Independent uniform priors on $[0, q]$ and $[0,1]$ are used for $\rho = F_\theta(x_{\min})$ and the MTD $\eta$, respectively; see (2) and (3) and the sentence following it. We use $q = 1/3$ and the EWOC loss with $\omega = 1/4$ in (7), and the squared error loss $g(\hat\eta, \eta) = (\hat\eta - \eta)^2$ in (8). Since $n$ is relatively small, we can assume for simplicity that $(\beta_k^{(1)}, \beta_k^{(2)})$ in (21) does not vary with $k$ and estimate the common $(\beta^{(1)}, \beta^{(2)})$ by applying least squares regression to the sample

$$\{(e_{k,b}, s_{k,b}) : 1 \le k \le n,\ 1 \le b \le B\}.$$

We also simply use (21) for all $s$ without performing the extrapolation beyond $[s_*, s^*]$. Rolling out EWOC as the base design and using $B = 2000$ simulations, the preceding procedure gave $(\beta^{(1)}, \beta^{(2)}) = (0.096, 0.02)$. Putting

$$\hat\varepsilon_k = \min\{1, (0.096 + 0.02\,\nu_{k-1}/\nu_0)_+\} \tag{23}$$

in the hybrid design

$$\hat x_k^{(1)} = (1 - \hat\varepsilon_k) m_k + \hat\varepsilon_k \ell_k, \tag{24}$$

we used $\hat x^{(1)}$ as the base policy of a second rollout, for which the preceding procedure yielded $(\beta^{(1)}, \beta^{(2)}) = (-0.72, 0.94)$. Here we used the sequential c-optimal design with $c = [0, 1]'$ as the learning design $\ell_k$ (see Section 3). Table 1 contains the operating characteristics, explained below, of EWOC and its rollout, the first hybrid design $\hat x^{(1)}$ with $\hat\varepsilon_k$ given by (23) and the second hybrid design $\hat x^{(2)}$ in which $(0.096, 0.02)$ in (23) is replaced by $(-0.72, 0.94)$. Each result is based on 2000 simulation runs. The values of $(\rho, \eta)$ were generated from the prior distribution given by the joint uniform distribution on $[0, q] \times [x_{\min}, x_{\max}]$. Figure 1 plots the cumulative risk $R_k$ (the expected loss summed over the first $k$ patients) of the EWOC, rollout and hybrid designs for $k = 1, \ldots, n\ (= 10)$. The operating characteristics in Table 1 are the Monte Carlo estimates of overall risk $R_{10}$, the bias and root mean squared error (RMSE) of the terminal MTD estimate $\hat\eta_{10}$, the DLT rate $P(y = 1)$ and the overdose rate OD, which is the expected proportion of patients treated at doses higher than $\eta$. Standard errors are given in parentheses.
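The pooled least squares step can be sketched as follows. The pairs below are synthetic stand-ins generated from an arbitrary line plus noise, not rollout output, and the helper name `fit_common_beta` is hypothetical; in practice the pairs would come from the simulated rollouts described above.

```python
import random


def fit_common_beta(pairs):
    """Ordinary least squares of e on s over pooled (e, s) pairs.

    Returns (beta1, beta2), the intercept and slope of the fitted line.
    """
    n = len(pairs)
    mean_e = sum(e for e, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    sxx = sum((s - mean_s) ** 2 for _, s in pairs)
    sxy = sum((s - mean_s) * (e - mean_e) for e, s in pairs)
    beta2 = sxy / sxx
    beta1 = mean_e - beta2 * mean_s
    return beta1, beta2


# Synthetic (e, s) pairs for illustration only: e = 0.10 + 0.05 s + noise.
rng = random.Random(0)
pairs = []
for _ in range(2000):
    s = rng.uniform(0.1, 1.0)
    e = 0.10 + 0.05 * s + rng.gauss(0.0, 0.01)
    pairs.append((e, s))

beta1, beta2 = fit_common_beta(pairs)
```

The fitted pair is then plugged into the truncated form (21), which is how the coefficients in (23) are used.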

The first hybrid design, which is an approximation to the rollout design, provides more than 10% improvement in terminal risk $R_{10}$ over the myopic policy. The second hybrid design provides an additional 5% improvement in the terminal risk $R_{10}$, and also smaller values of the DLT and OD rates than the rollout design. The Monte Carlo simulations used to evaluate the operating characteristics and to fit the hybrid designs were performed by using rejection sampling to simulate from the posterior distribution. At each stage, the posterior distribution of $(\rho, \eta)$ is continuous and supported on the compact set $[0, q] \times [x_{\min}, x_{\max}]$; hence, the joint uniform distribution on $[0, q] \times [x_{\min}, x_{\max}]$ is a natural candidate for the instrumental distribution in rejection sampling; see also the last paragraph of Section 5.
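A minimal sketch of such a rejection sampler is given below. It assumes that model (1) is the two-parameter logistic curve reparametrized so that $F(x_{\min}) = \rho$ and $F(\eta) = p$; the function names, default arguments and example data are hypothetical. Because the uniform prior density is constant on its support and the likelihood is a product of probabilities, the likelihood itself can serve as the acceptance probability.

```python
import math
import random


def logit(u):
    return math.log(u / (1.0 - u))


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def tox_prob(x, rho, eta, p, x_min):
    """Logistic dose-toxicity curve with F(x_min) = rho and F(eta) = p (assumed form)."""
    slope = (logit(p) - logit(rho)) / (eta - x_min)
    return sigmoid(logit(rho) + slope * (x - x_min))


def sample_posterior(doses, outcomes, n_draws, p=1/3, q=0.2,
                     x_min=0.0, x_max=1.0, rng=None):
    """Rejection sampling with the uniform prior as instrumental distribution."""
    rng = rng or random.Random()
    draws = []
    while len(draws) < n_draws:
        rho = rng.uniform(0.0, q)
        eta = rng.uniform(x_min, x_max)
        if rho <= 0.0 or eta <= x_min:
            continue  # boundary points have prior probability zero
        lik = 1.0
        for x, y in zip(doses, outcomes):
            f = tox_prob(x, rho, eta, p, x_min)
            lik *= f if y == 1 else 1.0 - f
        # posterior / proposal is proportional to the likelihood, which is <= 1
        if rng.random() < lik:
            draws.append((rho, eta))
    return draws


# Hypothetical data: two patients, the second one with a DLT.
draws = sample_posterior([0.2, 0.4], [0, 1], n_draws=200, rng=random.Random(0))
```

The acceptance rate falls as more patients are observed, since the likelihood shrinks; the sketch makes no attempt to address that.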
Table 1

Risk, bias and RMSE of the final MTD estimate, DLT rate and overdose rate (OD) of
EWOC, rollout (ROLL) of EWOC, and 1st and 2nd hybrid approximations

Design     Risk          Bias            RMSE          DLT             OD
EWOC       0.84 (0.01)   -0.20 (0.010)   0.31 (0.04)   29.8% (0.7%)    21.9% (0.6%)
ROLL       0.75 (0.01)   -0.04 (0.009)   0.22 (0.03)   33.0% (0.7%)    31.2% (0.7%)
Hybrid 1   0.75 (0.02)   -0.14 (0.012)   0.29 (0.06)   33.5% (1.5%)    37.5% (1.5%)
Hybrid 2   0.71 (0.01)   -0.04 (0.005)   0.22 (0.04)   31.24% (0.9%)   27.8% (0.9%)



Fig. 1. Risk for EWOC, rollout of EWOC and hybrid designs. (Vertical axis: $R_k$.)



4. A TWO-STAGE MODIFICATION AND 
COMPARATIVE STUDY 

Babb et al. (1998) used EWOC to design a Phase I trial to determine the MTD, with $p = 1/3$, of the antimetabolite 5-fluorouracil (5-FU) for the treatment of solid tumors in the colon, when taken in conjunction with fixed levels of the agents leucovorin (20 mg/m$^2$) and topotecan (0.5 mg/m$^2$). In this setting, a toxicity is defined as a grade 4 hematologic or grade 3 or 4 nonhematologic toxicity within 2 weeks. As mentioned above, EWOC involves specifying pre-trial a set

$$\lambda_1 < \lambda_2 < \cdots < \lambda_r \tag{25}$$

of possible dose values believed to contain the MTD, where $x_{\min}$ is taken as the starting value. Based on preliminary studies of 5-FU given in conjunction with topotecan, a dose of $x_{\min} = 140$ mg/m$^2$ of 5-FU was believed to be safe when given with 0.5 mg/m$^2$ of topotecan. Also, a previous trial concluded that the MTD of 5-FU was 425 mg/m$^2$ when administered without topotecan, so $x_{\max}$ was taken to be 425 mg/m$^2$ since 5-FU has been observed to be more toxic when given with topotecan than alone. The two-parameter logistic model (1) was chosen based on previous experience with the agents, and uniform prior distributions over $[x_{\min}, x_{\max}]$ and $[0, 0.2]$ were chosen for the MTD and the probability $F_\theta(x_{\min})$, respectively. A feasibility bound of $\omega = 0.25$ was chosen for EWOC and $p = 1/3$. We compare EWOC and other previous designs with the rollout (abbreviated by ROLL) of EWOC and the Hybrid 1 design described in Section 3.2 in this setting with $n = 24$.

To give a feel for the computational time required for the EWOC, ROLL and Hybrid 1 designs, on a desktop personal computer with a 2.66 GHz processor, the simulation of a single $n = 24$ run of the ROLL design with the EWOC base design took 49 minutes, whereas the Hybrid 1 design took 0.4 seconds and EWOC took 0.12 seconds. The Hybrid 1 design is computationally much simpler than ROLL since it does not perform rollouts of a base design, but rather calculates its dose via (24), where $m_k$ is the EWOC dose and $\ell_k$ is the sequential c-optimal learning design. So although the interpolation function (23) is derived from data gathered by ROLL during its rollouts as described in Section 3, the Hybrid 1 design has computational time on the order of that of EWOC and the learning design $\ell_k$, even though the computational time required for ROLL is large.
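The reason Hybrid 1 is cheap is visible once (23) and (24) are written out: given the EWOC dose and the learning dose, each new dose is a single convex combination. The sketch below uses the coefficients reported in Section 3.2; the function names and the example inputs (`m_k = 0.30`, `l_k = 0.50`, relative posterior standard deviation `1.0`) are hypothetical.

```python
def experimentation_weight(rel_sd, beta1=0.096, beta2=0.02):
    """Weight (23): min{1, (beta1 + beta2 * nu_{k-1}/nu_0)_+}."""
    return min(1.0, max(0.0, beta1 + beta2 * rel_sd))


def hybrid_dose(m_k, l_k, rel_sd, beta1=0.096, beta2=0.02):
    """Hybrid dose (24): convex combination of treatment dose m_k and learning dose l_k."""
    eps = experimentation_weight(rel_sd, beta1, beta2)
    return (1.0 - eps) * m_k + eps * l_k


# Hypothetical inputs on the [0, 1] dose scale.
dose = hybrid_dose(m_k=0.30, l_k=0.50, rel_sd=1.0)
```

The cost of a Hybrid 1 dose is therefore dominated by computing `m_k` and `l_k` themselves, which is not shown here.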

4.1 A Comparative Study 

Table 2 first lists Bayesian designs, followed by non-Bayesian designs that include Wu's (1985) design, stochastic approximation (SA; Lai and Robbins, 1979) and two 3-plus-3 dose escalation designs. The first 3-plus-3, denoted by $3+3_{10}$, uses 10 uniformly-spaced dose levels in $[x_{\min}, x_{\max}] = [140, 425]$. The second uses 20 uniformly-spaced dose levels and is denoted by $3+3_{20}$. Besides EWOC and its rollout ROLL, the Bayesian designs include CRM, the constrained D-optimal design (abbreviated by D-opt) of Haines et al. (2003) with constraint $\epsilon = 0.05$ and the unconstrained sequential Bayesian c-optimal design (abbreviated by c-opt) with $c$ being the vector $(0,1)^T$. The prior density is assumed to be uniform:

$$\pi(\rho, \eta) = \frac{1}{q\,(x_{\max} - x_{\min})} \cdot \mathbf{1}\{(\rho, \eta) \in [0, q] \times [x_{\min}, x_{\max}]\} \tag{26}$$

with $q = 0.2$, where $\mathbf{1}(A)$ denotes the indicator of a set $A$. The values of $(\rho, \eta)$ were generated from the prior distribution (26).

The performance of these designs is first evaluated in terms of the global risk (8), in which we use the squared error $g(\hat\eta, \eta) = (\hat\eta - \eta)^2$ for the MTD estimate $\hat\eta = \hat\eta(x_1, y_1, \ldots, x_n, y_n)$. We then evaluate performance exclusively in terms of the bias and root mean squared error (RMSE) of $\hat\eta$ without taking into consideration the risk to current patients, noting that the c- and D-optimal designs focus on errors of post-trial parameter estimates. Finally, since safety of the patients in the trial is the primary concern of traditional 3-plus-3 designs, performance is also evaluated in terms of the DLT rate and the probability of overdose (i.e., dose level exceeding the MTD). Each result in Table 2 is based on 2000 simulations.
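As an illustration of how these summaries could be computed from simulation output, the sketch below takes a list of simulated trials and returns Monte Carlo estimates of bias, RMSE, DLT rate and overdose rate. The record layout (keys `eta`, `eta_hat`, `doses`, `outcomes`) is a hypothetical choice, and the risk column is omitted because it depends on the loss functions (7) and (8).

```python
import math


def operating_characteristics(trials):
    """Summarize simulated trials.

    Each trial is a dict with the true MTD 'eta', the final estimate 'eta_hat',
    the list of administered 'doses' and the 0/1 toxicity 'outcomes'.
    """
    errors = [t["eta_hat"] - t["eta"] for t in trials]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err ** 2 for err in errors) / len(errors))
    n_patients = sum(len(t["doses"]) for t in trials)
    # DLT rate: proportion of patients with a toxic outcome
    dlt_rate = sum(sum(t["outcomes"]) for t in trials) / n_patients
    # Overdose rate: proportion of patients dosed above the true MTD
    od_rate = sum(
        sum(1 for x in t["doses"] if x > t["eta"]) for t in trials
    ) / n_patients
    return {"bias": bias, "rmse": rmse, "dlt_rate": dlt_rate, "od_rate": od_rate}


# Two hand-made trials, purely for illustration.
example = [
    {"eta": 0.5, "eta_hat": 0.4, "doses": [0.3, 0.6], "outcomes": [0, 1]},
    {"eta": 0.6, "eta_hat": 0.8, "doses": [0.5, 0.7], "outcomes": [0, 0]},
]
summary = operating_characteristics(example)
```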

The results in Table 2 show that the effect of considering the "future" patients is large, with ROLL and Hybrid 1 substantially reducing the global risk from the myopic designs: in the case of ROLL, about 30% from EWOC, 35% from CRM, and more from the 3-plus-3, c-opt, D-opt, SA and Wu designs. Although ROLL has somewhat smaller global risk than Hybrid 1, it is computationally much more expensive, as noted above. The results for the 3-plus-3 designs show that they are highly sensitive to the choice of $\lambda_1, \lambda_2, \ldots$ in (25). The $3+3_{10}$ design, using 10 uniformly-spaced levels in $[x_{\min}, x_{\max}]$, performs perhaps surprisingly well, as it even has smaller risk than D-opt, which suffers from substantial underdosing due to its overdose constraint of $\epsilon = 0.05$. This seems to be because the number 10 of dose levels was a fortuitous choice given the parameter values and sample size of this study, allowing the $3+3_{10}$ design to escalate to near the MTD in most cases. However, there is often little information about the appropriate number of doses and scale before a Phase I cancer trial begins, and when dose levels are chosen over a less fortuitous range or on too fine a scale, as with the $3+3_{20}$ design, the majority of doses can end up being administered at levels far below the therapeutic range near the MTD. We emphasize that all of these designs are being evaluated using the EWOC loss function in (7) which, in particular for CRM, differs from its associated loss function; using the CRM loss function in (7) results in CRM having smaller global risk than EWOC, but again ROLL (with CRM as its base design) yields smaller global risk than both, and the relationship between the other designs remains roughly unchanged.

In terms of MTD estimation accuracy, CRM and 
Wu have the smallest RMSE, closely followed by c- 
opt and ROLL; CRM and Wu also have the small- 
est absolute bias. It is interesting to note that the 
designs which explicitly account for the asymmet- 
ric underdose/overdose relationship, that is, ROLL, 
Table 2

Risk, bias and RMSE of the final MTD estimate, DLT rate and MTD overdose rate (OD), with SEs in parentheses, of
various designs

Design      Risk          Bias             RMSE            DLT              OD
ROLL        0.81 (0.01)   -0.069 (0.002)   0.126 (0.022)   27.68% (1.70%)   29.17% (1.87%)
Hybrid 1    0.92 (0.03)   -0.075 (0.003)   0.128 (0.028)   24.68% (0.86%)   23.48% (0.68%)
EWOC        1.13 (0.01)   -0.076 (0.003)   0.138 (0.024)   26.17% (0.98%)   19.69% (0.89%)
CRM         1.65 (0.01)    0.037 (0.003)   0.118 (0.021)   36.37% (1.10%)   62.69% (1.10%)
c-opt       1.71 (0.01)    0.060 (0.003)   0.126 (0.022)   23.44% (0.95%)   12.42% (0.74%)
D-opt       1.96 (0.02)   -0.084 (0.006)   0.143 (0.023)   13.55% (0.31%)    3.78% (0.17%)
Wu          1.77 (0.04)    0.038 (0.009)   0.122 (0.045)   23.40% (0.54%)   40.25% (0.77%)
SA          1.52 (0.02)    0.063 (0.003)   0.131 (0.022)   22.39% (0.93%)   35.56% (0.40%)
3 + 3_10    1.87 (0.01)    0.060 (0.003)   0.138 (0.024)   17.06% (0.84%)    0.85% (0.21%)
3 + 3_20    2.19 (0.02)    0.070 (0.002)   0.161 (0.025)   14.11% (0.81%)    0.75% (0.29%)

Fig. 2. The first five dose levels given by EWOC, ROLL (in italics) and Hybrid 1 (in bold) in the 5-FU trial. (Dose tree for the case in which the first dose of 140 is nontoxic.)



EWOC and D-opt (through its overdose constraint $\epsilon$), are negatively biased, while the others all have positive bias.

In terms of safety, D-opt and the 3 + 3 designs have the smallest DLT and OD rates, but in view of their large risk values, this safety comes at the cost of low doses that are nontherapeutic. CRM has high estimation accuracy and moderate risk, but also the largest DLT and OD rates because of its symmetric loss function. The remaining designs, ROLL, EWOC, Wu and SA, all have comparable DLT and OD rates, but their risk values suggest that the magnitudes of the overdoses in Wu and SA are larger than in EWOC, which, in turn, has larger overdoses than ROLL.

Of particular concern in Phase I trials is coherence of the design (Cheung, 2005), that is, whether the dose for the next patient is not escalated if the current patient experiences a toxicity, and not de-escalated if the current patient does not. While a theoretical investigation of the coherence of the ROLL and Hybrid designs is beyond our scope here, as an illustrative example Figure 2 lists the first five doses given by EWOC, ROLL and Hybrid 1 in the 5-FU trial setting, assuming a nontoxic response to the first dose of $x_{\min} = 140$. Note that coherence is exhibited by all three designs in this example.
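Coherence of a single realized dose path can be checked mechanically. The sketch below is an illustration only: the function name and the example dose paths are hypothetical, and a path-wise check of this kind does not establish coherence of a design in general.

```python
def is_coherent(doses, outcomes):
    """Check one dose path for coherence.

    doses[i] is the dose given to patient i + 1 and outcomes[i] is 1 if that
    patient had a toxicity, 0 otherwise. The path is coherent if the dose is
    never raised right after a toxicity and never lowered right after a
    nontoxic response.
    """
    for prev, nxt, y in zip(doses, doses[1:], outcomes):
        if y == 1 and nxt > prev:
            return False  # escalation after a toxicity
        if y == 0 and nxt < prev:
            return False  # de-escalation after a nontoxic response
    return True


# Hypothetical paths: nontoxic then toxic responses for the first two patients.
ok_path = is_coherent([140, 200, 180], [0, 1])
bad_path = is_coherent([140, 200, 220], [0, 1])
```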

4.2 A Two-Stage Design 

When one has concerns about the validity of the Bayesian parametric model in this model-based approach, one can readily incorporate the hybrid designs as the second stage of a two-stage design. The first stage of such a design escalates the doses cautiously by using a modified 3-plus-3 design. For the batches of 3 in the 3-plus-3 design, we propose to combine the nonparametric step-up/down approach with a parametric model-based dose determining scheme, thereby checking the parametric model to be used for model-based escalation in the second stage. This modification of the traditional 3-plus-3 design uses a specified set of dose levels (25). Set $d_1 = \lambda_1 = x_{\min}$. In the $k$th group of 3 patients, 2 patients are treated at the same dose $d_k = \lambda_j$ and 1 patient at the EWOC dose $m_k$, computed given the doses and responses of the previous $3(k-1)$ patients. If no DLT occurs in the group of 3 patients, $d_{k+1}$ is increased to $\lambda_{j+1}$. If 1 DLT occurs, $d_{k+1}$ stays the same at $d_k = \lambda_j$. Otherwise, 2 or 3 DLTs have occurred, so the trial is stopped if $d_k = x_{\min}$, and otherwise continues with $d_{k+1}$ lowered to $\lambda_{j-1}$. (Alternatively, it may be desired to stop when 3 toxicities occur, regardless of what $d_k$ was.) The EWOC dose $m_{k+1}$ is updated when the process is repeated with the next group of 3 patients. This process repeats until a certain fraction of the total number $n$ of patients has been treated, provided the trial has not been stopped at the first stage due to excess toxicities. We have found from our simulation studies that switch-over points around $n/3$ or $n/4$ seem to strike a balance between enough time for conservative dose escalation and model checking during the first stage, while leaving enough time for efficient dose escalation in the second stage.
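The step-up/down rule of the first stage can be sketched as a function of the current level index and the number of DLTs in the group of 3. The function name, the 0-based indexing and the choice to remain at the top level when no DLT occurs there are assumptions of this sketch; the text does not say what happens at the highest level. Computing the EWOC dose for the third patient is not shown.

```python
def next_level(j, n_dlt, n_levels, stop_on_three=False):
    """Return the index of the next dose level, or None if the trial stops.

    j         : 0-based index of the current level in the grid (25)
    n_dlt     : number of DLTs among the 3 patients in the current group
    n_levels  : number of levels in the grid
    stop_on_three : if True, stop whenever all 3 patients have a DLT
    """
    if not 0 <= n_dlt <= 3:
        raise ValueError("n_dlt must be between 0 and 3")
    if n_dlt == 0:
        # escalate; assumption: stay put if already at the top level
        return min(j + 1, n_levels - 1)
    if n_dlt == 1:
        return j
    # 2 or 3 DLTs
    if j == 0 or (stop_on_three and n_dlt == 3):
        return None
    return j - 1


# The grid used in the comparative study: 10 uniformly spaced levels in [140, 425].
levels = [140 + i * (425 - 140) / 9 for i in range(10)]
```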

The benefit of a first stage of conservative dose escalation occurs when, unlike in Table 2, the prior distribution of the MTD is misspecified. For example, if the true MTD falls in the left tail of the prior distribution of $\eta$, then the prior information about the MTD is biased upward, which can cause overdoses. In this situation, including an initial stage of modified dose escalation, like the modified 3-plus-3 scheme, provides additional safety by refining the prior to be more accurate when it begins to be used in the second stage. Focusing on the CRM, EWOC, ROLL and Hybrid 1 designs, Table 3 contains the results of a simulation study that considers such a situation, where the true MTD is the lower 15th percentile of the MTD's nominal uniform prior distribution on $[x_{\min}, x_{\max}]$. That is, the data are generated with $\eta$ fixed at the 15th percentile of $[x_{\min}, x_{\max}]$ and $\rho$ uniformly distributed over $[0, q]$, with $q = 0.2$ as in Table 2. The nominal prior for $(\rho, \eta)$ used by the Bayesian procedures in Table 3 is (26), the same as in Table 2, as are the values of the other parameters. To see the effects of the first stage of more conservative dose escalation, the operating characteristics of ROLL and Hybrid 1 are recomputed using a first stage of length $n/4 = 6$; the dose levels (25) used by the modified 3-plus-3 design are 10 uniformly-spaced levels in $[x_{\min}, x_{\max}] = [140, 425]$.

Adding this first stage to ROLL or Hybrid 1 reduces the risk, DLT and overdose rates, as shown in Table 3, in which (a) refers to the case of $n = 24$ patients without the modified 3-plus-3 first stage, and (b) refers to the two-stage design using a first stage of length $n/4 = 6$ consisting of the modified 3-plus-3 design.

5. CONCLUSION 

Despite their shortcomings and the development 
of alternative Bayesian approaches since 1990, con- 
ventional dose-escalation designs are still widely used 
in Phase I cancer trials because of the ethical is- 
sue of safe treatment of patients currently in the 
trial. However, a Phase I design also has the goal 
Table 3

Risk, bias and RMSE of the final MTD estimate, DLT rate and MTD overdose rate (OD), with SEs in parentheses, of
various designs with the MTD fixed at the lower 15th percentile of the misspecified prior

Design          Risk          Bias             RMSE            DLT              OD
ROLL       (a)  1.64 (0.02)   -0.031 (0.003)   0.142 (0.025)   30.32% (1.03%)   39.38% (1.09%)
           (b)  1.39 (0.02)   -0.025 (0.003)   0.145 (0.026)   27.41% (1.00%)   33.39% (1.05%)
Hybrid 1   (a)  1.82 (0.05)   -0.032 (0.002)   0.151 (0.027)   36.90% (1.52%)   42.31% (1.56%)
           (b)  1.69 (0.04)   -0.027 (0.003)   0.131 (0.036)   35.70% (1.51%)   41.11% (1.56%)
EWOC            2.29 (0.02)   -0.034 (0.003)   0.155 (0.028)   35.33% (1.07%)   45.98% (1.11%)
CRM             3.83 (0.02)    0.037 (0.004)   0.179 (0.032)   44.18% (1.11%)   65.12% (1.07%)


of determining the MTD for a future Phase II cancer trial, and needs an informative experimental design to meet this goal. Von Hoff and Turner (1991) have documented that the overall response rates in Phase I trials are low and that substantial numbers of patients are treated at doses that are retrospectively found to be nontherapeutic. Eisenhauer et al. (2000), page 685, have pointed out that "with a plethora of molecularly defined antitumor targets and an increasingly clear description of tumor biology, there are now more antitumor candidate therapies requiring Phase I study than ever," and that "unless more efficient approaches are undertaken, Phase I trials may be a rate-limiting step in the process of evaluation of novel anticancer agents." The hybrid designs in the previous section were motivated by the aim of developing one such "more efficient" approach.

Hybrid designs with simple interpolation functions, refined through iterated rollouts and regression, can be implemented by using simple look-up tables for the parameters in (21), and thus can be relatively simple for clinicians to use. Given computer packages to compute the standard myopic and learning designs, practitioners can use a look-up table for the values $\varepsilon_k$ in (18) as a function of the relative posterior standard deviation $\nu_{k-1}/\nu_0$. For given values of the prior parameters $x_{\min}$, $x_{\max}$, $p$, $q$ and $\omega$, a computer package can generate this look-up table, which can be used at every stage of the trial. We are in the process of developing open source software for this purpose.
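A look-up table of this kind might be generated as follows. This sketch is not the software mentioned above: the function name and the grid of relative posterior standard deviations are hypothetical, and the two coefficient pairs are the fitted values reported in Section 3.2 for the first and second hybrid designs.

```python
def lookup_table(fits, grid):
    """Tabulate the weight min{1, (b1 + b2 * s)_+} over a grid of s values.

    fits : list of (b1, b2) coefficient pairs, one per hybrid design
    grid : values of the relative posterior standard deviation s
    Returns one row per grid value: (s, weight for fit 1, weight for fit 2, ...).
    """
    rows = []
    for s in grid:
        weights = [min(1.0, max(0.0, b1 + b2 * s)) for b1, b2 in fits]
        rows.append((s, *weights))
    return rows


# Hypothetical grid; coefficients from Section 3.2.
grid = [i / 10 for i in range(11)]
table = lookup_table([(0.096, 0.02), (-0.72, 0.94)], grid)
for s, w1, w2 in table:
    print(f"{s:4.1f}  {w1:6.3f}  {w2:6.3f}")
```

At each stage a clinician would read off the weight for the current relative posterior standard deviation and combine the myopic and learning doses as in (24).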

Tighiouart, Rogatko and Babb (2005) have shown how Markov chain Monte Carlo (MCMC) can be used to compute the posterior distribution of $(\rho, \eta)$ when the prior distribution is supported on $[0, q] \times [x_{\min}, \infty)$, extending the model considered above where the support of $\eta$ is bounded above by $x_{\max}$. They note that priors in this class with a negative correlation structure between $\rho$ and $\eta$ result in an EWOC design with comparable accuracy for estimating the MTD but lower DLT and OD rates, relative to its performance for priors supported on $[0, q] \times [x_{\min}, x_{\max}]$. As noted in Section 4, a two-stage design can easily address the higher DLT and OD rates caused by misspecifications of such priors. On the other hand, even without a cautious first stage, the above and other generalizations of the prior of $(\rho, \eta)$ can be seamlessly incorporated into our hybrid design. In fact, the model M4 of Tighiouart et al. (2005), which has been shown to perform well in their simulation studies, has a left-truncated, hierarchical normal prior distribution on $\eta$, so the rejection sampling approach in the last paragraph of Section 3.2 can be applied here by using, say, the exponential distribution as the instrumental distribution, since its tails are upper bounds of those of the normal distribution. We can therefore still use the Monte Carlo approach laid out at the end of Section 3.2.
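To illustrate why an exponential instrumental distribution works for a left-truncated normal, the sketch below samples from a normal distribution restricted to $[\text{lower}, \infty)$ using a shifted exponential proposal. It covers only the truncated normal itself, not the hierarchical prior or the posterior; the function name and the particular choice of exponential rate are assumptions of this sketch.

```python
import math
import random


def sample_left_truncated_normal(mu, sigma, lower, rng):
    """Draw from N(mu, sigma^2) conditioned on being >= lower, by rejection."""
    a = (lower - mu) / sigma                    # standardized truncation point
    rate = 0.5 * (a + math.sqrt(a * a + 4.0))   # exponential rate; satisfies rate >= a
    while True:
        z = a + rng.expovariate(rate)           # shifted exponential proposal on [a, inf)
        # Ratio of the normal kernel to the exponential density, scaled so that
        # its maximum over z >= a (attained at z = rate) equals 1.
        if rng.random() <= math.exp(-0.5 * (z - rate) ** 2):
            return mu + sigma * z


rng = random.Random(0)
samples = [sample_left_truncated_normal(0.0, 1.0, 1.0, rng) for _ in range(20000)]
```

The acceptance probability never exceeds 1 because the normal tail decays faster than the exponential one, which is the tail-domination property the text relies on.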